The Structural Analysis of Programming Languages


B. J. MacLennan


1. Introduction


It is common to find articles in the programming language
literature riddled with unsupported claims. Words and phrases
such as 'better', 'simpler', 'more structured' and 'less error
prone' are used with abandon. If we were selling aspirin and
made such unsupported claims, we would probably be sued. We
clearly need more precise ways of measuring our languages.

A language's structures are some of its most important
characteristics. These include the data structures: those
mechanisms that the language provides for organizing elementary
data values. They also include the control structures, which
organize the control flow. Less obviously, they include the name
structures, which partition and organize the name space.

Languages can be compared relative to their structures in
the data, control and name domains (and others, such as the
syntactic domain). To make this comparison precise, we need a
precise method of describing the structural properties of a
language. Further, this method should be syntax independent; it
should "look through" the syntax of a language to its underlying
structure. In the next section we discuss a means by which
programming language structures can be described.

2. Describing Structure

The number of different structures that a programmer can use
is essentially unlimited. For instance, there are infinitely
many ways to organize data or control flow. Since programming
languages are finite, there must be some finite means of
generating this infinite number of structures.

The means, of course, is to have some number of primitive
structures and some number of constructor functions which take
existing structures and compose them into new structures. For
instance, Pascal data types are built by applying the data type
constructors (array, record, set, etc.) to the primitive data
types (real, integer, char, etc.). This results in hierarchical
structures. Similarly, control flows may be organized by
applying the control flow constructors ('sequence', 'if' and
'while') to the control flow primitives (those constructs that
do not alter the control flow).
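To make the primitive/constructor idea concrete, here is a minimal Python sketch. The class names `Prim` and `Con`, the helper `array`, and the `depth` measure are our own illustrative choices, not part of any language definition. A structure is a term: either a primitive or a constructor applied to existing terms, so every structure is hierarchical by construction.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Tuple, Union

# A primitive structure, e.g. one of Pascal's predefined types.
@dataclass(frozen=True)
class Prim:
    name: str

# A constructor applied to existing structures (its arguments).
@dataclass(frozen=True)
class Con:
    name: str
    args: Tuple[Term, ...]

Term = Union[Prim, Con]

def depth(t: Term) -> int:
    """Height of the construction hierarchy; primitives have depth 0."""
    if isinstance(t, Prim):
        return 0
    return 1 + max((depth(a) for a in t.args), default=0)

def array(index: Term, elem: Term) -> Con:
    """Pascal-style array constructor: array [index] of elem."""
    return Con("array", (index, elem))

INTEGER, REAL, BOOLEAN, CHAR = (Prim(n) for n in ("integer", "real", "Boolean", "char"))

# array [char] of array [Boolean] of real
t = array(CHAR, array(BOOLEAN, REAL))

# A finite set of building blocks still yields unboundedly many structures:
nested = REAL
for _ in range(5):
    nested = array(BOOLEAN, nested)
```

Here `depth(t)` is 2 and `depth(nested)` is 5; the loop could run any number of times, which is the sense in which finitely many primitives and constructors generate an unlimited family of structures.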


The hierarchical application of constructors to primitives
is the most common method of building structures. Thus, we can
use this as a starting point for our analysis of structures. As
a first approximation, we can compare the structural complexity
of two programming languages by comparing the number of
primitives and constructors in each. For example, we can see
from Table 1 that Pascal has 5 primitive data types and 7 data
type constructors.


TABLE 1. Data Structures

Pascal      5 primitives:     real, integer, Boolean, char, text
            7 constructors:   subrange, enumeration, set, array, file,
                              pointer, record

Algol-60    3 primitives:     real, integer, Boolean
            1 constructor:    array

LISP 1.5    1 primitive:      atom
            1 constructor:    list

Algol-68    11 primitives:    int, real, bool, char, format, compl, bits,
                              bytes, string, sema, file
            6 constructors:   long, ref, array, struct, union, proc

Since Algol-60 has 3 primitives and 1 constructor, it is probably
simpler than Pascal. Conversely, since Algol-68 has 11 primitives
and 6 constructors, it is likely to be more complex. However, the
number of primitives and constructors is not the entire story.
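The first-approximation comparison amounts to simple bookkeeping over Table 1. In the sketch below, summing primitives and constructors into a single score (`crude_size`, a hypothetical name) is our own simplifying assumption; the counts are the ones Table 1 lists.

```python
# (primitives, constructors) per language, as listed in Table 1.
TABLE_1 = {
    "Pascal": (5, 7),
    "Algol-60": (3, 1),
    "LISP": (1, 1),
    "Algol-68": (11, 6),
}

def crude_size(lang: str) -> int:
    """Hypothetical first-approximation score: primitives + constructors."""
    primitives, constructors = TABLE_1[lang]
    return primitives + constructors

# Smallest (presumably simplest) first.
ranking = sorted(TABLE_1, key=crude_size)
print(ranking)  # ['LISP', 'Algol-60', 'Pascal', 'Algol-68']
```

A score like this says nothing about which constructors may be applied to which structures, which is exactly the gap the following discussion of interrelationships addresses.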

A significant aspect of the structuring mechanisms provided
by a language is the complexity of the interrelationships among
the primitives and constructors. For instance, if the output of
every constructor is a legitimate input to every constructor, and
every primitive is a legitimate input to every constructor, then
the system will be more regular than if this is not the case.
This is often called 'orthogonality'. It is also part of what is
involved when we call a language 'structured'. In the next
section we will develop means for analyzing these relationships.
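One way to make 'orthogonality' concrete, offered only as an illustrative sketch: list every (constructor argument position, candidate input) pair and measure the fraction that are legal, so that a fully orthogonal system scores 1.0. The fragment below (four primitives, three constructors) and its acceptance rules are a hypothetical simplification read off the Pascal type grammar given in Figure 1, where array indices and set bases must be index types and the remaining positions accept any type.

```python
# A deliberately small, hypothetical fragment of the Pascal type system.
PRIMITIVES = ["integer", "real", "Boolean", "char"]
# Each constructor and its argument positions ("slots").
CONSTRUCTORS = {"array": ["index", "elem"], "set": ["base"], "file": ["elem"]}
# Candidate inputs: every primitive plus the output of every constructor.
INPUTS = PRIMITIVES + list(CONSTRUCTORS)
# Slots restricted to index types; only Boolean and char qualify
# among the inputs of this fragment.
INDEX_SLOTS = {("array", "index"), ("set", "base")}
INDEX_TYPES = {"Boolean", "char"}

def accepts(con: str, slot: str, inp: str) -> bool:
    if (con, slot) in INDEX_SLOTS:
        return inp in INDEX_TYPES
    return True  # the grammar places no restriction on the other slots

def orthogonality() -> float:
    """Fraction of (slot, input) pairs that are legal; 1.0 = fully orthogonal."""
    slots = [(c, s) for c, ss in CONSTRUCTORS.items() for s in ss]
    legal = sum(accepts(c, s, i) for c, s in slots for i in INPUTS)
    return legal / (len(slots) * len(INPUTS))

print(f"{orthogonality():.3f}")  # 0.643
```

Of the 28 pairs, the two index-restricted slots each admit only 2 of the 7 inputs, giving 18/28. The ratio is just one possible summary; the structure diagrams introduced later convey the same irregularities graphically.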

3. Data Structures
3.1 Semantic Grammars


We will begin with data structures to illustrate our technique
for analyzing structure. Our goal is to analyze the
interrelationships among the primitives and constructors of a
system of data structures. How are we to go about this? We can
begin by looking at syntax because, in most languages, there is a
close relation between the syntax and the structures it embodies
(i.e., form follows function). In particular, there will usually
be exactly one syntactic construct for each data primitive.
Consider Pascal. We can see from Table 1 that the primitives are
denoted by the predefined type identifiers 'integer', 'Boolean',
'real', 'char' and 'text'. There are constructors for
enumerations, subranges, sets, arrays, records, files and
pointers. We know that these are constructors because each can
generate a potentially unlimited number of structures (types).
Since the Pascal grammar tells us what syntactic entities can go
together, this will be a big help in deciding what semantic
entities can go together.

Consider the array type. We can write its syntax as

array-type ::= array [ index-type ,... ] of type

The index-type must be a type isomorphic to a subrange of the
integers. Syntactically, this can take the form:

index-type: scalar-type | subrange-type | type-identifier

scalar-type: ( identifier ,... )

subrange-type: constant .. constant

What we are interested in, however, is the semantics of the array
constructor. Since we know that the index type must be isomorphic
to a subrange of the integers, we know that the type-identifier
must name either a scalar-type, a subrange-type, or one of the
predefined finite discrete-types, Boolean and char. Also, a
subrange must be constructed from discrete constants (i.e.,
integers, or elements of a scalar or finite discrete type). We
can write this as a "semantics-oriented grammar":

array-type: array [ index-type ,... ] of type
index-type: scalar-type | subrange-type | discrete-type
scalar-type: ( identifier ,... )
subrange-type: constant .. constant
discrete-type: Boolean | char

One further simplification can be made here. Recall that in
Pascal

array [ i, j ] of t

is just an abbreviation for

array [ i ] of array [ j ] of t

Thus, without loss of generality, the definition of array-type
can be written

array-type: array [ index-type ] of type

We have not altered the syntax; we have just eliminated some
syntactic sugar. The semantics of most of the rest of Pascal's
constructors closely follows their syntax.
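The rewrite from `array [i, j] of t` to nested single-index arrays is mechanical. Here is a sketch with hypothetical `MultiArray` and `Array` node classes, where index types and element types are represented as plain strings for brevity:

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class MultiArray:
    """Sugared form: array [i1, ..., in] of elem."""
    indices: Tuple[str, ...]
    elem: object

@dataclass(frozen=True)
class Array:
    """Core form: array [index] of elem."""
    index: str
    elem: object

def desugar(t: object) -> object:
    """Rewrite every MultiArray into nested single-index Arrays."""
    if isinstance(t, MultiArray):
        result = desugar(t.elem)
        # Build from the innermost index outwards.
        for idx in reversed(t.indices):
            result = Array(idx, result)
        return result
    if isinstance(t, Array):
        return Array(t.index, desugar(t.elem))
    return t  # type names are left unchanged

print(desugar(MultiArray(("i", "j"), "t")))
# Array(index='i', elem=Array(index='j', elem='t'))
```

Because `desugar` also recurses into element types, sugar nested anywhere inside a type is removed as well, so the remaining analysis only ever sees the single-index form.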

If we are to be able to compare structures in different
languages, we must obviously ignore any syntactic differences
that exist between them. This we can do by writing the grammar
in a neutral, functional form. For instance, for arrays:

array-type: array (index-type, type)

index-type: scalar-type | subrange-type | discrete-type

scalar-type: scalar ( identifier* )

subrange-type: subrange (constant, constant)

discrete-type: Boolean | char
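Because the neutral form is just nested constructor applications, checking a term against it is straightforward. In this sketch, terms are Python tuples such as `("array", "char", "real")`, which is a hypothetical encoding. Accepting `integer` and `real` as element types is an assumption taken from the fuller Pascal grammar, since the rules above define only the array-related nonterminals.

```python
def is_index_type(t) -> bool:
    """index-type: scalar-type | subrange-type | discrete-type"""
    if t in ("Boolean", "char"):  # discrete-type
        return True
    if isinstance(t, tuple) and t[:1] == ("scalar",):  # scalar (identifier*)
        return all(isinstance(i, str) for i in t[1:])
    if isinstance(t, tuple) and t[:1] == ("subrange",):  # subrange (constant, constant)
        return len(t) == 3
    return False

def is_type(t) -> bool:
    """Accept array terms whose index slot is an index-type."""
    if t in ("integer", "real") or is_index_type(t):
        return True
    if isinstance(t, tuple) and t[:1] == ("array",):  # array (index-type, type)
        return len(t) == 3 and is_index_type(t[1]) and is_type(t[2])
    return False

print(is_type(("array", "char", ("array", "Boolean", "real"))))  # True
print(is_type(("array", "real", "char")))  # False: real is not an index-type
```

The checker mirrors the grammar rule for rule, which is the point of the neutral form: the same recognizer shape would work for another language's type grammar written in this notation.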

3.2 Interpretation

Now, let us make some observations about these rules. Consider
a typical string generated by this grammar:

array (char, array (Boolean, real))

This string describes a particular Pascal data type. Now suppose
BOOLEAN = { true, false } is the set of all Boolean values and
REAL is the set of all real values. Then the set of all arrays
with Boolean indices and real elements is just the set of
functions mapping BOOLEAN into REAL: [ BOOLEAN -> REAL ].
Therefore, we can see that the string shown above describes the
set of data values:

[ CHAR -> [ BOOLEAN -> REAL ] ]

This suggests that we can define an interpretation function,
I, that associates a set of data values with each string
generated by the grammar. This can be defined recursively:


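$$
\begin{aligned}
I[\,\mathrm{array}(t, t')\,] &= [\, I[t] \to I[t'] \,] \\
I[\,\mathrm{scalar}(i_1, \ldots, i_n)\,] &= \{\, i_1, \ldots, i_n \,\} \\
I[\,\mathrm{subrange}(C, C')\,] &= \{\, x \mid C \le x \wedge x \le C' \,\} \\
I[\,\mathrm{Boolean}\,] &= \mathrm{BOOLEAN} \\
I[\,\mathrm{char}\,] &= \mathrm{CHAR} \\
I[\,\mathrm{real}\,] &= \mathrm{REAL}
\end{aligned}
$$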
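For finite types the recursive equations can be executed directly. The sketch below uses the same hypothetical tuple encoding of terms and computes I[t] as a frozenset, representing each function in [A -> B] as a tuple of (argument, value) pairs. It handles only integer subranges and omits char and real: CHAR depends on the implementation's character set, and REAL cannot be enumerated. It also illustrates that the number of functions in [A -> B] is |B| raised to the power |A|.

```python
from itertools import product

def interpret(t) -> frozenset:
    """I[t] for finite types only; functions are tuples of (arg, value) pairs."""
    if t == "Boolean":                       # I[Boolean] = BOOLEAN
        return frozenset({True, False})
    kind = t[0] if isinstance(t, tuple) else t
    if kind == "scalar":                     # I[scalar(i1..in)] = {i1..in}
        return frozenset(t[1:])
    if kind == "subrange":                   # I[subrange(C, C')] = {x | C <= x <= C'}
        low, high = t[1], t[2]
        return frozenset(range(low, high + 1))
    if kind == "array":                      # I[array(t, t')] = [I[t] -> I[t']]
        domain = sorted(interpret(t[1]), key=repr)
        codomain = sorted(interpret(t[2]), key=repr)
        return frozenset(
            tuple(zip(domain, values))
            for values in product(codomain, repeat=len(domain))
        )
    raise ValueError(f"not a finite type in this sketch: {t!r}")

# array (Boolean, subrange (1, 3)): 3 ** 2 = 9 functions
print(len(interpret(("array", "Boolean", ("subrange", 1, 3)))))  # 9
```

Nesting works as the equations predict: `array (Boolean, array (Boolean, Boolean))` denotes [ BOOLEAN -> [ BOOLEAN -> BOOLEAN ] ], which has 4 ** 2 = 16 elements.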


To make this interpretation more obvious, we will write
$\mathrm{subrange}(C, C')$ as $C..C'$, and
$\mathrm{scalar}(i_1, \ldots, i_n)$ as $\{ i_1, \ldots, i_n \}$.
Figure 1 shows the complete Pascal type system using these
conventions.


type: simple-type | structured-type | pointer-type

simple-type: index-type | integer | real

index-type: scalar-type | subrange-type | discrete-type

scalar-type: { identifier+ }

subrange-type: constant .. constant

discrete-type: Boolean | char

structured-type: [packed] unpacked-structured-type

unpacked-structured-type: array-type | record-type | set-type | file-type
array-type: array (index-type, type)
record-type: record ([field*]) [variant-part]
field: field (identifier, type)
variant-part: field (identifier, index-type) × (constant × record-type)*
set-type: set (index-type)
file-type: file (type)
pointer-type: pointer (type)


Figure 1. The Pascal Type System


Defining the interpretation for record-type and pointer-type
is quite complicated without the notations of a relational
calculus, so they will not be shown here. The interpretations of
set and file types are easy to define:

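$$
\begin{aligned}
I[\,\mathrm{set}(t)\,] &= P(I[t]) \\
I[\,\mathrm{file}(t)\,] &= I[t]^{*}
\end{aligned}
$$

where P is the power-set function and $I[t]^{*}$ is the set of all
finite sequences of elements of $I[t]$.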
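The set and file equations can be sketched the same way. Since I[t]* is infinite even when I[t] is finite, the `sequences` helper below (a hypothetical name) enumerates only the sequences up to a chosen length bound; `power_set` computes P exactly.

```python
from itertools import chain, combinations, product

def power_set(xs) -> frozenset:
    """P(X): every subset of X, each as a frozenset."""
    xs = list(xs)
    subsets = chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    return frozenset(frozenset(s) for s in subsets)

def sequences(xs, max_len: int) -> list:
    """A finite prefix of X*: all sequences over X of length <= max_len."""
    xs = list(xs)
    return [s for n in range(max_len + 1) for s in product(xs, repeat=n)]

# set (Boolean) has 2 ** 2 = 4 values; files of Booleans of length <= 2: 1 + 2 + 4 = 7
print(len(power_set({True, False})), len(sequences({True, False}, 2)))  # 4 7
```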

It should be noted that the above equations imply structural
equivalence of Pascal types, as opposed to name equivalence. The
Revised Report on Pascal [4] does not define the form of type
equivalence used. It is simple to alter the above definitions to
accommodate name equivalence; we just represent each type by a
pair where the first element of the pair is the type's identifier
and the second element of the pair is the type in the structural
sense. Thus we have:

type: identifier × unnamed-type

unnamed-type: simple-type | structured-type | pointer-type
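The difference between the two forms of equivalence shows up as soon as types are represented as pairs. A sketch with a hypothetical `NamedType` class, with structural types written as tuples in the same illustrative encoding used above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NamedType:
    """type: identifier × unnamed-type"""
    name: str          # the type's identifier
    structure: tuple   # the type in the structural sense

def structurally_equivalent(a: NamedType, b: NamedType) -> bool:
    return a.structure == b.structure  # compare only the unnamed types

def name_equivalent(a: NamedType, b: NamedType) -> bool:
    return a == b  # compare the whole (identifier, unnamed-type) pair

vec1 = NamedType("vec1", ("array", "Boolean", "real"))
vec2 = NamedType("vec2", ("array", "Boolean", "real"))
print(structurally_equivalent(vec1, vec2), name_equivalent(vec1, vec2))  # True False
```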
It should be pointed out that there are limitations to the
descriptive power of this notation. For instance, it does not
express the fact that the identifiers in scalar-types must be
distinct, or that type identifiers must be distinct. To include
all this information would clutter the notation to the point of
unusability.

4. Structure Diagrams

We have said that the complexity of a collection of structures
is reflected by the complexity of the semantic grammar. It is
still a little difficult to see this complexity in the
traditional BNF form. For this purpose we have found a
diagrammatic form enlightening. This is really a dependency
graph (showing which nonterminals depend on which others) coupled
with special symbols for various operations, viz.:


[Structure diagram symbols for: A*, A+, A × B, A | B | C, [A | B]]

where [A | B] means either A or B or nothing.


In our semantic grammars (as in syntactic grammars) common
structural patterns are factored out and given names. This
reflects the fact that these structural patterns only have to be
learned once. In the structure diagrams this factoring is
represented by an edge that forks and goes to each of the uses of
that structure. For example, since 'index-type' is used both as
a part of 'simple-type' and as a part of array and set types,
the edge from index-type goes to the subgraphs defining each of
these structures. We have adopted the convention of only using
binary forks; since edges represent dependencies, this simplifies
complexity estimation by edge counting.
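The text does not spell out the exact counting rule, so the sketch below assumes one: a structure used k times is drawn as a single edge split by k - 1 binary forks, giving a tree with k - 1 fork nodes and k leaves, each entered by exactly one edge, for 2k - 1 edges in total. The `uses` table is a hypothetical excerpt of the dependencies in Figure 1.

```python
from collections import Counter

def edge_count(uses: dict) -> int:
    """uses[A] lists the nonterminals that A's definition depends on."""
    fan_out = Counter(b for rhs in uses.values() for b in rhs)
    # k uses -> one edge into a tree of k-1 binary forks with k leaves:
    # (k-1) fork nodes + k leaves, each entered by exactly one edge = 2k-1.
    return sum(2 * k - 1 for k in fan_out.values())

# Excerpt of Figure 1: index-type is used in three places.
uses = {
    "simple-type": ["index-type"],
    "array-type": ["index-type", "type"],
    "set-type": ["index-type"],
    "index-type": ["scalar-type", "subrange-type", "discrete-type"],
}
print(edge_count(uses))  # 9
```

Restricting the diagrams to binary forks is what makes the count well defined: a many-way fork would otherwise count as a single junction regardless of how many structures depend on it.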


Structures from other systems are represented by T-shaped
terminations. Given this explanation, the reader is encouraged
to compare the diagram of Pascal's data structures in Figure 2
with the semantic grammar in Figure 1. The data structures of
LISP, Algol-60 and Algol-68 are diagrammed in Figures 3-5.
Figure 2. The Pascal Type System

Figure 3. The LISP Type System

Figure 4. The Algol-60 Type System

Figure 5. The Algol-68 Type System
5. Name Structures

Next, we will demonstrate the application of these techniques
to the name structures, another subsystem of programming
languages. The name structures of programming languages are
often described by terms such as "block-structured",
"monolithic", "disjoint", etc. To get a better grasp on these
structuring techniques we must ask, "What is being structured?"
To put it more precisely, "What relation or relations are being
controlled by the structuring mechanisms in question?"


For name structures this relation is visibility, that is,
the relation that holds between a binding and a use of an
identifier when that use can refer to that binding. Thus, the
primitives from which name structures are assembled are bindings
and uses of identifiers, and the constructors used to assemble
these structures are mechanisms such as block structure.
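Visibility in a block-structured name system can be sketched as a search outward through nested scopes. The `Scope` class, the `binding_for` function, and the innermost-binding-wins rule below are assumptions matching Algol-style block structure, not a formalization taken from this paper.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional, Set

@dataclass(eq=False)
class Scope:
    """A block: the identifiers it binds and the block enclosing it."""
    bindings: Set[str]
    parent: Optional[Scope] = None

def binding_for(scope: Scope, name: str) -> Optional[Scope]:
    """Return the scope whose binding a use of `name` in `scope` refers to."""
    s = scope
    while s is not None:
        if name in s.bindings:
            return s  # the innermost binding is the visible one
        s = s.parent
    return None  # no binding is visible

outer = Scope({"x", "y"})
inner = Scope({"x"}, parent=outer)
print(binding_for(inner, "x") is inner, binding_for(inner, "y") is outer)  # True True
```

In these terms the primitives are the entries in `bindings` and the uses passed to `binding_for`, while the nesting of `Scope` objects plays the role of the block-structure constructor.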

How can we abstract the name structures from a programming
language? Again, we can use syntax as a guide. In Figure 6 we
show the fragments of Algol-60 syntax relevant to visibility.
Irrelevant parts of the syntax have been elided. Each string
generated by this grammar (ignoring reordering of declarations,
etc.) defines a unique name structure, i.e., structural
arrangement of visibility relations. In Figure 7 we have
formulated a semantics-oriented grammar for these relations.
<identifier> ::= ....

<block> ::= <block head>; <compound tail>

<block head> ::= begin <declaration> | <block head>; <declaration>

<compound tail> ::= <statement> end
                  | <statement>; <compound tail>
<program> ::= <block> | <compound statement>
<procedure declaration> ::= [<type>] procedure
                            <proc.heading> <proc.body>
<proc.heading> ::= <proc.identifier> <formal par.part>;
<formal par.part> ::= ( <identifier> ,... )
<declaration> ::= <proc.decl.> | <other decl.>

Figure 6. A Fragment of Algol-60


program: executable

block: scope (declaration+, executable)

declaration: simple-decl | proc-decl

proc-decl: identifier × scope (simple-decl*, executable)

simple-decl: identifier

executable: {identifier | block}*

Figure 7. The Algol-60 Name System


Notice that, from the visibility standpoint, a procedure
declaration is the same as a block; they both bind local
identifiers and delimit a scope. Figure 8 shows the Algol-60 name
system in diagrammatic form. The following figures (9-11) show
the name systems of the lambda calculus, FORTRAN and Pascal.
Figure 10. The Pascal Name System

Figure 11. The FORTRAN Name System


In the latter case (Pascal), note that we have analyzed the
record declaration as a scope-defining (or name-grouping)
constructor. Figure 12 compares the complexities (as measured by
edge count) of these name systems along with the complexities of
their type systems.
Figure 12. Complexities of Name and Type Systems


6. Control Structures


Control structures are analyzed in the same way as the other
structures. The results are shown in the equations and
structure diagrams of Figures 13-16.
Figure 13. Pascal Control Structures

Figure 16. BASIC Control Structures


Consider Pascal; the relevant parts of the grammar are shown
in Figure 17. These diagrams are somewhat deceptive because they
do not reflect the extraordinary complexity introduced into the
control structures by the goto statement. An analogous
complexity is caused in data structures by the pointer construct.
These are both examples of non-local references, whose proper
treatment remains an open question.

simple-statement: assign-stat | proc-stat | goto-stat | empty

assign-stat: expr

function-desig: call (fid, exprlist)

exprlist: expr*

expr: function-desig*

proc-stat: call (fid, {expr | fid}*)

goto-stat: goto (label)

statement: [label] × unlab-stat

unlab-stat: simple-statement | struc-stat

struc-stat: comp-stat | cond-stat | rep-stat | with-stat

comp-stat: statement*

cond-stat: if-stat | case-stat

if-stat: if (expr, stat, [stat])

case-stat: case (expr, case-list-element)

case-list-element: const × statement

rep-stat: while-stat | repeat-stat | for-stat

while-stat: while (expr, stat)

repeat-stat: rep (stat+, expr)

for-stat: for (id, forlist, stat)

forlist: expr × [down] × expr

with-stat: with (expr+, stat)

Figure 17. Pascal Control Structure Grammar
7. Conclusions

The techniques we have described provide a simple, visual
method of comparing the structuring methods provided by
programming languages. Languages can often be ranked as to their
structural complexity by comparing the complexity of their
structural grammars or structure diagrams. In addition, the
diagrams allow the language designer to appraise the regularity
or irregularity of a structural subsystem and to identify areas
where it can be simplified.

Of course, it is very desirable to be able to quantify these
ideas, and there are many approaches to this quantification. One
of the simplest, which was used in this paper, is to count the
number of edges in the graph, since this reflects the
dependencies within the system. In the cases we have
investigated, this metric agrees with our informal evaluation.

There are, of course, other graph-theoretic measures that
can be applied, for instance, variants of McCabe's Cyclomatic
Number [3], although which is the best remains an open question.
It is also possible to apply the measures of Halstead's "Software
Science" [1] to either the structural grammar or the structure
diagrams. This has also been tried, but this work is still in
progress [2].
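As one example of such a graph-theoretic measure, McCabe's cyclomatic number is V(G) = E - N + 2P for a graph with E edges, N nodes and P connected components. The sketch below computes only this plain formula, treating edges as undirected when counting components; which variant best suits structure diagrams is, as noted, an open question.

```python
def cyclomatic_number(nodes, edges) -> int:
    """McCabe's V(G) = E - N + 2P, P = number of connected components."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two components
    components = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + 2 * components

print(cyclomatic_number("abc", [("a", "b"), ("b", "c")]))  # 1 (a tree)
print(cyclomatic_number("abc", [("a", "b"), ("b", "c"), ("c", "a")]))  # 2 (one cycle)
```

Unlike a raw edge count, V(G) grows only with the independent cycles in a graph, so a strictly hierarchical structure diagram (a tree) scores 1 however large it is.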

Although the proper measure to be applied remains an open
problem, the representation of structures in a measurable form,
such as the structure diagrams, is a first step toward the
development of these metrics. Future research will attempt to
refine the analysis of structures and their representation as
graphs, and will attempt to develop appropriate measures of their
complexity.